Thread: [C] Reading a file using realloc

  1. #1
    Registered User
    Join Date
    Feb 2011
    Posts
    5

    [C] Reading a file using realloc

    Hello everyone,

    I'm going to skip the introduction and get straight to the code:

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    char *output = NULL;
    int CharInFile = 0;
    FILE * Read;
    
    // main here. calls readfile after opening file.
    
    int readfile (FILE * Read)
    {
    int c = 0;
    int x;
    
        do {
            c = 0;
            c = fgetc (Read);
            if (c != EOF) {       
                output = (char *) realloc (output, (CharInFile + 1) * sizeof(char));
                output[CharInFile] = c; printf("%c", output[CharInFile]);
            } 
            CharInFile++;
        } while (c != EOF);
        printf("%s\n", output); 
        return 0;
    }
    Now, my intention with this code is to print out the file's content twice, once in a loop:
    Code:
    printf("%c", output[CharInFile]);
    (This one works fine.)

    and the other at the end
    Code:
    printf("%s\n", output);
    The second one doesn't work. It prints out just the newline.

    The reason I want to do this is that I want to use string.h functions such as strstr, which don't work with single characters (and no, I'd prefer not to write a makeshift function to make them do so). It doesn't need to use realloc, so long as the whole file is loaded.


    Thanks in advance.

  2. #2
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
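    It works fine for me. There's a slight bug: you're incrementing CharInFile even when c is EOF. My version below is saved as foobar.c and main() opens foobar.c itself, so the output is the source printed twice, once by the character-by-character printf() and once by printf("%s\n", output). The output I get is: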
    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    char *output = NULL;
    int CharInFile = 0;
    FILE * Read;
    
    // main here. calls readfile after opening file.
    
    int readfile (FILE * Read)
    {
    int c = 0;
    int x;
    
        do {
            c = 0;
            c = fgetc (Read);
            if (c != EOF) {       
                output = (char *) realloc (output, (CharInFile + 1) * sizeof(char));
                output[CharInFile] = c; printf("%c", output[CharInFile]);
                CharInFile++;
            } 
        } while (c != EOF);
        printf("%s\n", output); 
        return 0;
    }
    
    int main(void)
    {
      Read = fopen("foobar.c", "r");
      readfile(Read);
      fclose(Read);
      return 0;
    }
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    char *output = NULL;
    int CharInFile = 0;
    FILE * Read;
    
    // main here. calls readfile after opening file.
    
    int readfile (FILE * Read)
    {
    int c = 0;
    int x;
    
        do {
            c = 0;
            c = fgetc (Read);
            if (c != EOF) {       
                output = (char *) realloc (output, (CharInFile + 1) * sizeof(char));
                output[CharInFile] = c; printf("%c", output[CharInFile]);
                CharInFile++;
            } 
        } while (c != EOF);
        printf("%s\n", output); 
        return 0;
    }
    
    int main(void)
    {
      Read = fopen("foobar.c", "r");
      readfile(Read);
      fclose(Read);
      return 0;
    }
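    One thing neither version does is write a '\0' after the last character, so printf("%s\n", output) depends on whatever byte happens to follow the data in the realloc()'d block. That is undefined behavior, so "works for me" may just be luck. A minimal sketch, assuming a hypothetical helper read_all_chars() in place of the global output/CharInFile pair, that keeps the buffer terminated after every character and checks realloc() for failure:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of `in` into a heap buffer, one character
   at a time. The buffer is kept '\0'-terminated so it can be passed to
   printf("%s") or strstr(). Returns the buffer (caller frees) and stores
   the number of characters read in *len; returns NULL if allocation fails. */
char *read_all_chars(FILE *in, size_t *len)
{
    char *buf = malloc(1);          /* room for the terminator only */
    size_t n = 0;
    int c;

    if (!buf)
        return NULL;
    buf[0] = '\0';

    while ((c = fgetc(in)) != EOF) {
        /* grow by one char plus the terminator; realloc into a temporary
           so the old block is not leaked if realloc fails */
        char *tmp = realloc(buf, n + 2);
        if (!tmp) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        buf[n++] = (char)c;         /* count only real characters */
        buf[n] = '\0';              /* keep the string terminated */
    }

    if (len)
        *len = n;
    return buf;
}
```

    Usage might look like `size_t n; char *text = read_all_chars(Read, &n);`, then `printf("%s\n", text);` and `free(text);`. This keeps the thread's one-realloc-per-character pattern, so it is fine for small files but slow for large ones.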

  3. #3
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    The problem with reading one character at a time and calling realloc() for every character is that, for a file of any size, it will start off OK but then get slower and slower as the read proceeds... If realloc() can't extend the existing block in place, it has to allocate a new block and copy the data across, which takes time, especially when growing byte by byte.

    A far more efficient method (without calling OS specific directory reads) is to...

    Open your file.
    Use fseek() to move the file position to the end of the file ( fseek(File,0,SEEK_END); )
    Then ftell (File) to get the file size.
    Reset to the beginning of the file with fseek()
    Do a single calloc() call to get memory for the file
    Do a single fread() to load the entire file
    Close your file.

    Now you have a memory buffer with your entire file in it...
    Last edited by CommonTater; 02-20-2011 at 08:23 PM.

  4. #4
    Registered User
    Join Date
    Feb 2011
    Posts
    5
    Thanks itsme, it works well now (no idea why it didn't before, however). (Thanks for pointing out that small bug as well.)

    I'll look into doing so, CommonTater; I'll probably end up implementing it, as I plan to read relatively large files.

  5. #5
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Asym
    I'll look into doing so, CommonTater; I'll probably end up implementing it, as I plan to read relatively large files.
    Yep, far more efficient than byte by byte or even line by line... just get the whole thing and work on it in memory. Save your files the same way... point fwrite() at the buffer and write the whole thing in one call.

    I commonly use a similar process using Windows API calls and a 150meg file will load up in just a couple of seconds.
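    A minimal sketch of the "one fwrite() call" save, assuming a hypothetical save_buffer() helper that takes the buffer and its length. It checks both the byte count returned by fwrite() and the result of fclose(), since a write error can surface when fclose() flushes the stream:

```c
#include <stdio.h>

/* Hypothetical helper: write `len` bytes from `buf` to `filename` in a
   single fwrite() call. Returns 0 on success, -1 if the open, the write,
   or the close fails. */
int save_buffer(const char *filename, const char *buf, size_t len)
{
    FILE *out = fopen(filename, "wb");   /* binary: bytes written unchanged */
    if (!out)
        return -1;

    size_t written = fwrite(buf, 1, len, out);

    /* always close; fclose() flushes buffered data, so its result matters */
    if (fclose(out) != 0 || written != len)
        return -1;
    return 0;
}
```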

  6. #6
    Registered User
    Join Date
    Feb 2011
    Posts
    5

    Quote Originally Posted by CommonTater
    Yep, far more efficient than byte by byte or even line by line... just get the whole thing and work on it in memory. Save your files the same way... point fwrite() at the buffer and write the whole thing in one call.

    I commonly use a similar process using Windows API calls and a 150meg file will load up in just a couple of seconds.
    Alright, so I've got it working. It's a lot smaller and simpler than I thought it'd be too. Thanks for the help!

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    No worries.

    Just remember to error check the returns from fopen(), calloc(), and fread()... In particular, you need to watch for any difference between the file size reported by ftell() and the byte count returned by fread() to make sure you've got the entire file.

    One other thing... (forgot to mention this the first time) when you call calloc() to reserve memory, if you are using string functions on it, take a few extra bytes so you are sure the file buffer is null-terminated.

    Code:
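    #include <stdio.h>
    #include <stdlib.h>

    char *LoadFile(const char *Filename)
      { FILE *file;            // file handle
         long fsize;           // size from ftell() (about 2 GB max where long is 32 bits)
         char *buffer;         // file data; calloc() zero-fills it, so it is '\0'-terminated

         file = fopen(Filename,"rb");
         if (!file)
           return NULL;
         if (fseek(file, 0, SEEK_END) != 0 || (fsize = ftell(file)) < 0)
           { fclose(file);     // can't determine the file size
             return NULL; }
         fseek(file, 0, SEEK_SET);
         buffer = calloc(fsize + 8, 1);   // extra zeroed bytes act as the terminator
         if (!buffer)
           { fclose(file);
             return NULL; }
         if (fread(buffer, 1, fsize, file) != (size_t)fsize)
           { fclose(file);     // short read: didn't get the whole file
             free(buffer);     // don't leak the buffer on failure
             return NULL; }
         fclose(file);
         return buffer; }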
    And... don't forget to free() the buffer when you're done with it.
    Last edited by CommonTater; 02-20-2011 at 09:53 PM. Reason: silly me forgot to close the file....

  8. #8
    Registered User
    Join Date
    May 2010
    Location
    Naypyidaw
    Posts
    1,314
    Btw, are you sure you're really getting newline characters translated properly when you read the file as binary?
    Last edited by Bayint Naung; 02-21-2011 at 01:41 AM.

  9. #9
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Bayint Naung
    Btw, are you sure you're really getting newline characters translated properly when you read the file as binary?
    Doesn't matter... the goal is to get an image of the file into memory so you can parse and translate it there.

  10. #10
    Registered User
    Join Date
    May 2010
    Location
    Naypyidaw
    Posts
    1,314
    Quote Originally Posted by CommonTater
    Doesn't matter... the goal is to get an image of the file into memory so you can parse and translate it there.
    I passed the buffer to parse() function and it says unknown char '\r'.
    Last edited by Bayint Naung; 02-21-2011 at 10:32 AM. Reason: added grin!
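    If the buffer was loaded in binary mode, as LoadFile() does, Windows line endings arrive as "\r\n" pairs. One way to get roughly what text mode would have given you, without reopening the file, is to collapse each "\r\n" to "\n" in place. A minimal sketch with a hypothetical crlf_to_lf() helper, assuming the buffer is '\0'-terminated (which the extra calloc() bytes guarantee):

```c
#include <stddef.h>

/* Hypothetical helper: convert CR LF line endings to LF in place in a
   '\0'-terminated buffer. A '\r' that is not followed by '\n' is kept.
   Returns the new string length. */
size_t crlf_to_lf(char *buf)
{
    size_t src = 0, dst = 0;

    for (; buf[src] != '\0'; src++) {
        /* buf[src + 1] is safe: at worst it is the terminator */
        if (buf[src] == '\r' && buf[src + 1] == '\n')
            continue;               /* drop the CR, keep the following LF */
        buf[dst++] = buf[src];
    }
    buf[dst] = '\0';
    return dst;
}
```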

  11. #11
    Registered User
    Join Date
    Feb 2011
    Posts
    5
    Quote Originally Posted by Bayint Naung
    I passed the buffer to parse() function and it says unknown char '\r'.
    Then don't use Windows. ;D
    Also, what parse function are you using? I couldn't find anything under man parse.

    (The file is being opened as binary to ensure ftell returns the exact number.)

    Quote Originally Posted by CommonTater
    No worries.

    Just remember to error check the returns from fopen(), calloc(), and fread()... In particular, you need to watch for any difference between the file size reported by ftell() and the byte count returned by fread() to make sure you've got the entire file.

    One other thing... (forgot to mention this the first time) when you call calloc() to reserve memory, if you are using string functions on it, take a few extra bytes so you are sure the file buffer is null-terminated.
    Mhmm. I'm already adding sizeof (char) to my fsize to accommodate the null terminator, I'm freeing the buffer, error checking, etc.

    And, at the risk of sounding like a broken record, thanks again.

  12. #12
    Registered User
    Join Date
    May 2010
    Location
    Naypyidaw
    Posts
    1,314
    Quote Originally Posted by Asym
    Then don't use Windows. ;D
    Close the file, then reopen it in text mode and read it again.
    Edit: I'm not sure what your purpose is in reading the entire file into memory. Is it a requirement?
    What if the file is 1 GB? Still fast, huh?


    Quote Originally Posted by Asym
    The reason I want to do this is that I want to use string.h functions such as strstr, which don't work with single characters (and no, I'd prefer not to write a makeshift function to make them do so). It doesn't need to use realloc, so long as the whole file is loaded.
    Reading line by line, you could still use strstr(). Most *nix tools, such as awk and sed, process input on a line-by-line basis.
    Last edited by Bayint Naung; 02-21-2011 at 12:22 PM.
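    A minimal sketch of the line-by-line approach, assuming a hypothetical count_matching_lines() that counts lines containing a search string. fgets() reads into a fixed 1024-byte buffer, so memory use does not grow with the size of the file:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines of `in` that contain `needle`,
   reading one line at a time with fgets() instead of loading the file.
   Assumption: lines fit in the buffer. A longer line is read in pieces,
   so a match straddling a piece boundary is missed and a long matching
   line may be counted more than once. */
long count_matching_lines(FILE *in, const char *needle)
{
    char line[1024];
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, needle) != NULL)
            count++;
    }
    return count;
}
```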

  13. #13
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Calling realloc() this way will completely fragment the heap, since no freed block is ever large enough to contain the next required block, and both blocks are in existence simultaneously during the copy operation.
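    A common way to cut down the number of realloc() calls (not something proposed in this thread) is to grow the buffer geometrically, doubling its capacity whenever it fills up, so reading n bytes takes roughly log2(n) reallocations instead of n. A minimal sketch, with a hypothetical read_all_doubling() helper and an arbitrary starting capacity of 64 bytes:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of `in` into a '\0'-terminated heap
   buffer, doubling the capacity whenever it fills up. Returns NULL on
   allocation failure; the caller frees the result. *len receives the
   number of bytes read. */
char *read_all_doubling(FILE *in, size_t *len)
{
    size_t cap = 64, n = 0;       /* initial capacity is an arbitrary choice */
    char *buf = malloc(cap);
    int c;

    if (!buf)
        return NULL;

    while ((c = fgetc(in)) != EOF) {
        if (n + 1 >= cap) {       /* always keep one byte free for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    if (len)
        *len = n;
    return buf;
}
```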

  14. #14
    Registered User
    Join Date
    Feb 2011
    Posts
    5
    My code:

    Code:
    int main () 
    {
        Read = fopen ("test", "rb");
        readfile (Read);
        fclose (Read);
        Read = fopen ("test", "r");
        readfile (Read);
        fclose (Read);
        free (output);
        return 0;
    }
    My result:

    Code:
    Hello.
    
        TESTING
     THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG.
     1234567890!@#$%^&*()`~-_+=
    Hello.
    
        TESTING
     THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG.
     1234567890!@#$%^&*()`~-_+=

    Is this what you were asking? I can't seem to really understand what you're saying.
    (I'm running Ubuntu Maverick 10.10.)

    >Calling realloc() this way will completely fragment the heap, since no freed block is ever large enough to contain the next required block, and both blocks are in existence simultaneously during the copy operation. brewbuck

    Yep, we've gone through that above. (See the conversation between me and CommonTater.)
    Last edited by Asym; 02-21-2011 at 12:22 PM. Reason: Added OS and also forgot to reply to brewbuck.

  15. #15
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Bayint Naung
    I passed the buffer to parse() function and it says unknown char '\r'.
    \r and \n are the carriage return and newline (line feed) characters at the end of each line in a Windows text file.
    If you look at your file in a hex editor, you will probably find the 0x0D 0x0A sequence at the end of each line; these bytes correspond to the \r and \n escapes in C. Typically you can use strtok() with "\r\n" as the delimiter string to split the buffer and isolate each line for you... so you can parse them out one at a time.
    Last edited by CommonTater; 02-21-2011 at 12:41 PM.
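    A minimal sketch of the strtok() approach, assuming the buffer is '\0'-terminated and writable (strtok() overwrites each delimiter that ends a token with '\0'). Because strtok() treats "\r\n" as a set of delimiter characters rather than a two-byte sequence, it also splits on a lone '\n' and silently skips empty lines. The handle_line callback is a hypothetical stand-in for whatever parser you use:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split a '\0'-terminated buffer into lines with
   strtok(), using "\r\n" as the delimiter set so both CR LF and LF line
   endings are handled. `buf` is modified in place and empty lines are
   skipped. Each line is passed to handle_line (may be NULL).
   Returns the number of lines seen. */
int for_each_line(char *buf, void (*handle_line)(const char *line))
{
    int count = 0;
    char *line = strtok(buf, "\r\n");

    while (line != NULL) {
        if (handle_line)
            handle_line(line);
        count++;
        line = strtok(NULL, "\r\n");   /* continue where the last call stopped */
    }
    return count;
}
```

    strtok() keeps hidden state between calls, so it should not be used on two buffers at once; POSIX strtok_r() avoids that if it is available.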


Similar Threads

  1. Can you help me about tolower() in file
    By nctar in forum C Programming
    Replies: 7
    Last Post: 05-12-2010, 10:04 AM
  2. opening empty file causes access violation
    By trevordunstan in forum C Programming
    Replies: 10
    Last Post: 10-21-2008, 11:19 PM
  3. Formatting a text file...
    By dagorsul in forum C Programming
    Replies: 12
    Last Post: 05-02-2008, 03:53 AM
  4. Formatting the contents of a text file
    By dagorsul in forum C++ Programming
    Replies: 2
    Last Post: 04-29-2008, 12:36 PM
  5. System
    By drdroid in forum C++ Programming
    Replies: 3
    Last Post: 06-28-2002, 10:12 PM
